-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logger for efficient and convenient logging of objects #1343
base: p--query-resolution-perf--12_9
Are you sure you want to change the base?
Add logger for efficient and convenient logging of objects #1343
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
1 similar comment
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
4cec311
to
984906d
Compare
…convenient / efficient logging of objects.
984906d
to
b614cd1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to re-implement Python's logger interface here? How do you plan to keep this consistent with the logger object interface itself? Like does this wrapper implement the full protocol spec already?
More generally, this feels like a problem with the formatter, which we're addressing by building a logger creator factory class to return a custom logger class that implements some or all of Python's logger object interface and wraps those things in order to do lazy evaluation of the format command.
Did you consider, instead, making the formatter an object with overrides for the __str__
and __repr__
methods that would produce the formatted output? That obviates the need for all this error handling (which is not without its costs) and fallback logic, it eliminates the logging factory, and it allows everybody to use the Python logger module as normal and also take advantage of a LazyFormatter or whatever in contexts where they might not, in fact, need it for logging. We might even consider using a subclass of some of Python's formatting classes for this.
I'm requesting changes here because I really don't want us to keep building custom things that are better solved with more standard implementations. If there's a compelling reason to do all of this wrapping and log redirection we should discuss it, but I don't see one described in this PR.
We don't have to. The main issues I was trying to resolve are listed in the PR description. I'm up for any solution that can address those. It sounds like you have something else in mind - see below.
Can you elaborate on the mechanism / function interface for this formatter? I considered a something like a lazy proxy, but couldn't think of a way to make it work well. |
To confirm, you read the PR description? |
Yes, I did, and my contention is that there is nothing in there to justify overriding the built in logger mechanisms with a custom thing that may not, in fact, conform to reasonable and widespread user expectations of how the logger mechanisms work. There are a number of options but I think the two most reasonable are:
I haven't looked too deeply into option 2 but it doesn't seem worth the effort. Instead, I recommend option 1 and adding a class wrapper for the format operations and deferring formatting through It's worth noting that the same rule applies to the format args passed into the log call - the logger uses printf-style argument substitution, which means any
Or:
This solves all of the problems laid out in the PR description. Instead of overriding the entire logging framework with something that will almost certainly perform worse, we simply improve our formatter to allow for lazy evaluation and swap from f-strings to %s style format strings. That migration is, I admit, annoying. But it avoids all of the debt associated with taking a common, widely used construct like the python logger and wrapping it something that maybe works the same way. I personally have no way of even evaluating how these log calls get handled if someone, say, applies a logging config to filter most of them out. Worth noting, that's something that can happen outside of MetricFlow, because the container application expects to be able to control these behaviors and they might get quite cranky if we're calling all these log methods and doing all this delegation and possibly even doing added exception handling when they thought they'd removed all of that logic from their runtime. |
Finally, for cases where we are not currently using format strings, we can go from this:
To this:
No logger framework overrides needed, all we require is mf_pformat_lazy to return the appropriately configured lazy formatter object. |
Do you mean something like
It sounds like you see something that will impact performance. What is it?
It would make no difference. The class implemented here is a wrapper that just formats strings before passing them to the standard logger.
How does this resolve ability to output the object as JSON? Regarding your approach, you're saying that your preference is to migrate to:
Instead of the proposed interface:
due to the desire to stick with the standard interface of the |
I mean more on the end user (i.e., developer) side, not runtime performance, but there is an issue with us adding yet another thread safety lock call in here. Wouldn't we be better off not doing that and delegating thread safety for logging to the object built and designed to manage it? Overall, my desire is to avoid putting object shims and custom implementations of totally standard things in the way of every developer who has to use these things on a daily basis. You've created a wrapper shim around the entire logging interface that is a glorified message formatter, and we now have to contend with it.... when? Always? Only when we call mf_pformat()? I'm not sure. Consequently, I'm proposing we make that distinction clear instead of clobbering most of the direct access functionality Python's logger affords us and hiding it behind a shim. In practice, I'd expect an interface like this:
With the user having the option of doing something like this:
What would I do when I want to emit a log as JSON (which, I'll note, the PR description says we don't want to do here, and frankly I'm not sure how this change matters for that), or alter the formatting in some other way, or add extras to a particular log call, or anything else the Python logger interface supports? I would do what one does when trying to manipulate a given Python LogRecord object - use Filter and Formatter and possibly LogHandler objects to do it, if necessary, or just pass in the supported configuration parameters and arguments to various methods to make it happen. The interface you've added doesn't give me the access I expect for those things, even though the callsites look the same. I have to wade through this wrapper, or maybe write a custom class to do handling with a different log factory. It is in fact unclear to me what I'm expected to do in these cases. |
To make sure that we're talking about the same thing - when I say output the object as JSON, the use case (not mine) that this is trying to cater to is that whenever an object is logged, a JSON representation of the object is included via the
When I hear
If the similarity is the issue, then the interface can be changed / renamed (e.g. |
That lock was added as a defensive measure against bugs in custom implementations and can be removed.
I am a developer who works with this code base heavily on a daily basis, and I am tired of writing verbose log lines. My desire is to simplify that with some convenience measures.
If you're not sure about that, I don't think you understand what's in here. I don't see how |
class MetricFlowLoggerConfiguration: | ||
"""Global object used to configure logger object creation in MF. | ||
|
||
The factory can be configured to use other implementations to realize different behaviors like passing JSON to the | ||
`extra` argument. By default, it pretty-formats the objects and appends them to the message. | ||
""" | ||
|
||
_logger: MetricFlowLoggerFactory = PrettyPrintingLoggerFactory() | ||
_state_lock = threading.Lock() | ||
|
||
@classmethod | ||
def set_logger_factory(cls, logger_factory: MetricFlowLoggerFactory) -> None: | ||
"""Set the factory to be used for creating logger objects in the `get_logger` method.""" | ||
with cls._state_lock: | ||
cls._logger = logger_factory | ||
|
||
@classmethod | ||
def get_logger(cls, module_name: str) -> MetricFlowLogger: | ||
"""Get the logger using the module where the call was made. | ||
|
||
This is a replacement for the standard `logging.getLogger` pattern. | ||
|
||
e.g. | ||
import logging | ||
logger = logging.getLogger(__name__) | ||
|
||
-> | ||
|
||
import ... | ||
logger = MetricFlowLoggerConfiguration.get_module_logger() | ||
""" | ||
with cls._state_lock: | ||
return cls._logger.get_logger(module_name) | ||
|
||
|
||
def mf_get_logger() -> MetricFlowLogger: | ||
"""Get the logger using the module where the call was made. | ||
|
||
This is a replacement for the standard `logging.getLogger` pattern. | ||
|
||
e.g. | ||
import logging | ||
logger = logging.getLogger(__name__) | ||
|
||
-> | ||
|
||
import ... | ||
logger = mf_get_logger() | ||
""" | ||
caller_frame = inspect.stack()[1] | ||
caller_module = inspect.getmodule(caller_frame[0]) | ||
|
||
if caller_module is None: | ||
raise RuntimeError(f"Unable to get module for {caller_frame[0]=}") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my concern. This whole file.
What we're doing is formatting a log message based on some input parameters. That's literally it. But to do it, we have a state lock, and a factory class, and a support class shim, and a custom accessor function, and a stack inspector.
State management, stack-based accessors, and lazy evaluation of messages and message arguments are all things the built in logger class handles - a LogRecord Formatter can access the message input and the format args and operate on them as needed, etc.
Is there a problem with using a separate LogRecord Formatter to solve the issue you have with portioning things into extras? If you need different formatting from what MetricFlow is doing internally here you'll need a different formatter configuration regardless, and all of these wrappers don't make a lick of difference in terms of functionality there. Make our LazyPrinter or whatever have an API that a custom Formatter object can use to do what needs doing.
Also, the extra
parameter is just a kwarg dict that gets merged into the LogRecord attributes before formatting or routing. You can accomplish the same thing in formatting by passing in a single dict-style arg and then using it in a custom Formatter.
Basically you have a config for the MetricFlow logger objects and you set up a custom Formatter configuration that does the necessary format operations. If we're passing in objects and args instead of a pre-formatted string we can do what we like under the hood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a problem with using a separate LogRecord Formatter to solve the issue you have with portioning things into extras?
If it was just about lazy evaluation, this is the standard approach. If you take a look at the server implementation, the log calls use the extra
field and use a separate formatter. In the PR description, I noted as one of the issues was:
Logging calls were verbose, and string keys like "matching_path" were not amenable to auto-complete in the IDE.
I got tired of having to write these dictionaries / format strings, especially when auto-complete does not help. As per PR title, one of the 3 goals of this PR was to make these calls more convenient.
What we're doing is formatting a log message based on some input parameters. That's literally it. But to do it, we have a state lock, and a factory class, and a support class shim, and a custom accessor function, and a stack inspector.
Each of those can be removed to make things either less convenient or less configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. Remove the stack accessor and change the call style from:
logger = mf_get_logger()
to
logger = mf_get_logger(__name__)
Let's focus on the problem I have with this PR: We appear to be reinventing a bunch of stuff in the name of small conveniences. When we've done this in the past it has burned us over and over and over again. Maybe I'm over-rotating on this because I'm sick of it, but I also don't see much benefit here. The small convenience, in this case, is a single call to a LazyFormatter object instead of it being hidden behind our wrapped_logger.info() call. So this:
vs this:
Something to point out about the more streamlined interface - some of those kwargs are reserved. For example, you can't log an Now, getting to the the server side call with just the formatter object takes some doing, and it's a bit hacky, but a much safer and more limited scope form of hacky than what we have in this PR. What we need to do depends quite a bit on what's required on the server side as well. So, which of these do we need?
|
I have a few issues with the comments / points, and in the interest of reducing the back and forth, let's chat about it in our sync today. |
First of all, I want to make a public apology to @plypaul for the tone and tenor of my comments. I was not in a good mood, but that's no excuse and I try to be better than that. Per our discussion, two other approaches we're considering are:
Option 1 might be something like:
Option 2 is simpler, but it's most useful if we decide to always follow this pattern of logging a simple string with all objects in extras:
Option 2 does not directly support logging the extras inline in the message as well as in extras only - that would have to be done via a Formatter class. But it means we can just use all of the standard stuff. |
@plypaul there are two other options I just landed on while writing that comment. These only really make sense if we're going to always use
I'm a bit skittish about overrides on the Logger base class because I don't fully understand everything it's doing under the covers, but if they're just re-organizing keyword arguments and then passing those through to the superclass method it's probably safe enough. |
From the docstring:
In MF, there is a need to log complex objects like DAGs and paths for debugging. Previously, the pattern for logging
those objects used a pretty-formatting method like:
However, there were a few issues:
mf_pformat_many()
call was evaluated regardless of whether the given logging level was enabled. The formatting of complex objects was found to be significant in performance profiling.The class is used in a new approach that uses a custom logging class that takes keyword arguments that are only
converted to strings when the corresponding logging level is enabled. The logger is created via a configurable
factory that can be used to implement different ways of logging those objects (e.g. JSON).
A signature for the logging calls that allows non-named arguments and uses the variable names as keys in the output
was considered for reduced verbosity:
This approach was prototyped using the
varname
library to resolve the variable name, but it was found to be tooslow (~0.7s for 1000 calls).